Skip to content

compiler-rt: Introduce runtime functions for emulated PAC. #133530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pcc
Copy link
Contributor

@pcc pcc commented Mar 28, 2025

The emulated PAC runtime functions emulate the ARMv8.3a pointer
authentication instructions and are intended for use in heterogeneous
testing environments. For more information, see the associated RFC:
https://discourse.llvm.org/t/rfc-emulated-pac/85557

pcc added 2 commits March 28, 2025 15:33
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Copy link

github-actions bot commented Mar 28, 2025

✅ With the latest revision this PR passed the undef deprecator.

* Header File
* Copyright (C) 2012-2023 Yann Collet
*
* BSD 2-Clause License (https://www.opensource.org/licenses/bsd-license.php)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a license different from Apache-2.0 WITH LLVM-exception.
Therefore, the process described at https://llvm.org/docs/DeveloperPolicy.html#copyright-license-and-patents should be followed to check whether this is acceptable in this specific case.

That being said, xxhash is already present under llvm/lib/Support/xxhash.cpp, as you pointed out in the RFC.

Making sure we don't have multiple copies of non-Apache-2.0 WITH LLVM-exception code would be preferable. I'll tag @beanz, as he had ideas about how to better structure vendored third party code in LLVM.

I'm not sure if moving non-Apache-2.0 WITH LLVM-exception licensed code to a run-time library (for the first time?) triggers new concerns.

I'll note that other hashing algorithms, such as Blake3 and SipHash, which are available under the Apache-2.0 WITH LLVM-exception, are also already present in the LLVM Support library.
Would one of these preferably-licensed hashing algorithms be a good fit for the hashing functionality needed for this use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will contact [email protected] about this new usage.

I don't have a strong opinion about the hash algorithm. XXH3_64 was chosen for these reasons:

  1. It was the easiest to incorporate into compiler-rt as it was already available as a header-only library without dependencies.
  2. It was already being used in LLVM, so I imagined that licensing wouldn't be as much of a concern.
  3. It was faster than the other algorithms according to the xxhash homepage. BLAKE3 isn't listed there but we can extrapolate from BLAKE3's claim of being ~7x faster than BLAKE2. (I know I said elsewhere that performance is less of a concern for these functions, but all other things being equal, we may as well go with the algorithm with the best performance.)

If the board decides not to allow use in compiler-rt it should be possible to switch to one of the other algorithms after changing its code as needed to remove dependencies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The introduction of BSD-licensed code to compiler-rt (and especially to compiler-rt's builtins library) poses significant licensing difficulty.

The BSD 2-clause license requires attribution when code is distributed in object form as well as source form, and since the compiler-rt builtins are linked into binaries built with LLVM (not just LLVM itself), the impact of the license here would be transitive to products built using LLVM-based compilers.

This is extremely undesirable, and as such this change will need to replace the xxhash with an alternative hash (ideally under the LLVM license) in order to be integrated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, I've uploaded a new change that replaces the hash with SipHash.

Unfortunately SipHash is subtantially slower than xxhash and led to a ~3.5x slowdown (vs native PAC instructions) on the benchmark cited in my original RFC (and BLAKE3 isn't header only which would make it substantially harder to integrate here). Hopefully the performance hit doesn't turn out to be a problem here.

@@ -0,0 +1,115 @@
#include <stdint.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new file needs a license header, I guess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I forgot to add one. Will add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

uint64_t __emupac_autda_impl(uint64_t ptr, uint64_t disc) {
if (pac_supported()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth writing a lazy ifunc-like resolver for this so it doesn't have to be checked on every call. Something like:

typedef BOOL (*fptr_ty)(void);

static BOOL resolver(void);

// Should be signed with address diversity
fptr_ty __emupac_autda_impl = resolver;

static BOOL resolver(void) {
  if (pac_supported())
    __emupac_autda_impl = __emupac_autda_pac;
  else
    __emupac_autda_impl = __emupac_autda_nopac;

  return __emupac_autda_impl();
}

If we had FMV support for PAC I'd suggest __attribute__((target_version("pac"))). cc @labrinea

You might also want to consider giving __emupac_autda a preserves-none calling convention, so you don't have to save/restore around them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with using ifuncs here is that this function will itself be called from ifunc resolvers (see this), so that could lead to order of initialization issues.

Since performance is not that important for this function's use case I didn't try to avoid checking every time or experiment with other calling conventions. Also, if compiler-rt needs to be buildable with non-Clang compilers, I'm not sure that we can use the other calling conventions anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of calling convention to use will probably affect register allocation significantly in the caller functions. I wonder if it could hide some subtle codegen issues. On the other hand, it may uncover other ones :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We haven't noticed codegen issues like this in our internal testing (other than the performance hit which is expected).

Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot introduce BSD-licensed code to compiler-rt. Please see my comment for more details.

pcc added 2 commits April 2, 2025 21:34
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Copy link

github-actions bot commented Apr 3, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions c,cpp -- compiler-rt/lib/builtins/aarch64/emupac.cpp compiler-rt/test/builtins/Unit/aarch64/emupac.c
View the diff from clang-format here.
diff --git a/compiler-rt/lib/builtins/aarch64/emupac.cpp b/compiler-rt/lib/builtins/aarch64/emupac.cpp
index 4ccd68f1c..772640fb7 100644
--- a/compiler-rt/lib/builtins/aarch64/emupac.cpp
+++ b/compiler-rt/lib/builtins/aarch64/emupac.cpp
@@ -110,7 +110,7 @@ __emupac_pacda_impl(uint64_t ptr, uint64_t disc) {
   }
   uint64_t hash;
   siphash<1, 3>(reinterpret_cast<uint8_t *>(&ptr), 8, emu_da_key,
-                *reinterpret_cast<uint8_t(*)[8]>(&hash));
+                *reinterpret_cast<uint8_t (*)[8]>(&hash));
   return (ptr & ~pac_mask) | (hash & pac_mask);
 }
 
@@ -129,7 +129,7 @@ __emupac_autda_impl(uint64_t ptr, uint64_t disc) {
       (ptr & ttbr1_mask) ? (ptr | pac_mask) : (ptr & ~pac_mask);
   uint64_t hash;
   siphash<1, 3>(reinterpret_cast<uint8_t *>(&ptr_without_pac), 8, emu_da_key,
-                *reinterpret_cast<uint8_t(*)[8]>(&hash));
+                *reinterpret_cast<uint8_t (*)[8]>(&hash));
   if (((ptr & ~pac_mask) | (hash & pac_mask)) != ptr) {
     __builtin_trap();
   }

@pcc
Copy link
Contributor Author

pcc commented Apr 3, 2025

We cannot introduce BSD-licensed code to compiler-rt. Please see my comment for more details.

Thanks, resolved by switching to SipHash. #134197 extracts the existing SipHash implementation into a header that can be included here.

pcc added a commit to pcc/llvm-project that referenced this pull request Apr 3, 2025
The emulated PAC runtime functions emulate the ARMv8.3a pointer
authentication instructions and are intended for use in heterogeneous
testing environments. For more information, see the associated RFC:
https://discourse.llvm.org/t/rfc-emulated-pac/85557

TODO:
- Add tests.
- Figure out how to get emupac.cpp to build with CMake. For some reason
  any .cpp files added to the builtins library are ignored. So for now
  only the gn build works.

Pull Request: llvm#133530
Created using spr 1.3.6-beta.1
pcc added a commit to pcc/llvm-project that referenced this pull request Apr 4, 2025
The emulated PAC runtime functions emulate the ARMv8.3a pointer
authentication instructions and are intended for use in heterogeneous
testing environments. For more information, see the associated RFC:
https://discourse.llvm.org/t/rfc-emulated-pac/85557

TODO:
- Add tests.

Pull Request: llvm#133530
pcc added 6 commits April 7, 2025 20:45
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
@llvm-beanz llvm-beanz self-requested a review April 28, 2025 01:08
pcc added 4 commits May 12, 2025 21:36
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
pcc added a commit to pcc/llvm-project that referenced this pull request May 24, 2025
The emulated PAC runtime functions emulate the ARMv8.3a pointer
authentication instructions and are intended for use in heterogeneous
testing environments. For more information, see the associated RFC:
https://discourse.llvm.org/t/rfc-emulated-pac/85557

Pull Request: llvm#133530
@atrosinenko
Copy link
Contributor

Since this PR enables another language (CXX in addition to C and ASM) in the builtins library, it is probably worth reviewing by someone familiar with CMake build descriptions used in LLVM.

pcc added 2 commits June 5, 2025 22:16
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Copy link
Contributor Author

@pcc pcc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petrhosek can you review this as the CMake maintainer?

I also ran the tests with GCC as the compiler which revealed that GCC does not support the naked attribute on functions on AArch64 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77882) causing a test failure. To fix the test failure I replaced the entry points with module-level inline asm.

0x4a, 0x79, 0x6f, 0xec, 0x8b, 0x1b,
0x42, 0x87, 0x81, 0xd4};

extern "C" __attribute__((flatten)) uint64_t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this file is C++, can we use the C++ (and C23) attribute syntax?

Suggested change
extern "C" __attribute__((flatten)) uint64_t
extern "C" [[gnu::flatten]] uint64_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@petrhosek petrhosek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake changes LGTM.

pcc added 4 commits July 8, 2025 21:36
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
Created using spr 1.3.6-beta.1

[skip ci]
Created using spr 1.3.6-beta.1
@pcc pcc changed the base branch from users/pcc/spr/main.compiler-rt-introduce-runtime-functions-for-emulated-pac to main July 9, 2025 23:18
@pcc pcc merged commit 5b1db59 into main Jul 9, 2025
9 of 15 checks passed
@pcc pcc deleted the users/pcc/spr/compiler-rt-introduce-runtime-functions-for-emulated-pac branch July 9, 2025 23:18
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jul 9, 2025
The emulated PAC runtime functions emulate the ARMv8.3a pointer
authentication instructions and are intended for use in heterogeneous
testing environments. For more information, see the associated RFC:
https://discourse.llvm.org/t/rfc-emulated-pac/85557

Reviewers: llvm-beanz, petrhosek

Pull Request: llvm/llvm-project#133530
pcc added a commit that referenced this pull request Jul 10, 2025
Attempt to fix these build failures:
https://lab.llvm.org/buildbot/#/builders/107/builds/12601

The suspected cause is that #133530 caused us to start
passing -std:c11 to MSVC, which activated this code path
that uses _Complex, which MSVC does not support. See:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/complex-math-support

Fix it by also checking _MSC_VER.
@mstorsjo
Copy link
Member

This seems to cause failures in running the compiler-rt tests on Windows on aarch64:

https://github.com/mstorsjo/llvm-mingw/actions/runs/16183738002/job/45702529351#step:8:1964

# executed command: not KillTheDoctor 'C:\a\llvm-mingw\llvm-mingw\llvm-project\build-compiler-rt\test\builtins\Unit\AARCH64WindowsConfig\aarch64\Output\emupac.c.tmp' 3
# note: command had no output on stdout or stderr
# error: command failed with exit status: 1

It doesn't seem to contain much other details about the issue, unless the compiler warnings about the size of long for printfs are related.

Separately, a build of LLVM on macOS, using the latest Clang+LLVM+compiler-rt, also seems to be failing now (but that's possibly related to another PR than this one):

https://github.com/mstorsjo/llvm-mingw/actions/runs/16183738002/job/45692829706

ld64.lld: error: undefined symbol for arch arm64: _emupac_pacda_impl
>>> referenced by /Users/runner/llvm/lib/clang/21/lib/darwin/libclang_rt.osx.a(emupac.cpp.o):(symbol _emupac_pacda+0x8)
>>> did you mean: __emupac_pacda_impl
>>> defined in: /Users/runner/llvm/lib/clang/21/lib/darwin/libclang_rt.osx.a(emupac.cpp.o)

@pawosm-arm
Copy link
Contributor

Turned my CI red,

11:46:21  Testing: 
11:46:21  FAIL: Builtins-aarch64-linux :: aarch64/emupac.c (543 of 93329)
11:46:21  ******************** TEST 'Builtins-aarch64-linux :: aarch64/emupac.c' FAILED ********************
11:46:21  Exit Code: 1
11:46:21  
11:46:21  Command Output (stderr):
11:46:21  --
11:46:21  /workspace/build/stage/product_build/./bin/clang   -gline-tables-only   -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta  -DCOMPILER_RT_HAS_FLOAT16  -fno-builtin -I /workspace/src/compiler-rt/lib/builtins -nodefaultlibs /workspace/src/compiler-rt/test/builtins/Unit/aarch64/emupac.c /workspace/build/stage/product_build/./lib/../lib/clang/21/lib/aarch64-unknown-linux-gnu/libclang_rt.builtins.a -lc -lm -o /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp # RUN: at line 2
11:46:21  + /workspace/build/stage/product_build/./bin/clang -gline-tables-only -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -DCOMPILER_RT_HAS_FLOAT16 -fno-builtin -I /workspace/src/compiler-rt/lib/builtins -nodefaultlibs /workspace/src/compiler-rt/test/builtins/Unit/aarch64/emupac.c /workspace/build/stage/product_build/./lib/../lib/clang/21/lib/aarch64-unknown-linux-gnu/libclang_rt.builtins.a -lc -lm -o /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp
11:46:21  /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp 1 # RUN: at line 3
11:46:21  + /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp 1
11:46:21  /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp 2 # RUN: at line 4
11:46:21  + /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp 2
11:46:21  not --crash  /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp 3 # RUN: at line 5
11:46:21  + not --crash /workspace/build/stage/product_build/runtimes/runtimes-bins/compiler-rt/test/builtins/Unit/AARCH64LinuxConfig/aarch64/Output/emupac.c.tmp 3

should a revert be considered?

@atrosinenko
Copy link
Contributor

If I get the log posted by @pawosm-arm right, this test case should crash but it doesn't:

  case 3: {
    // Test that a corrupted signature crashes the program.
    uint64_t signed_ptr = __emupac_pacda(ptr1, ptr2);
    __emupac_autda(signed_ptr + (1ULL << 48), ptr2);
    break;
  }

I guess the reason is that the CPU supports FEAT_PAuth (thus actual autda instruction is executed instead of the software emulation), but not FEAT_FPAC (so that __emupac_autda simply returns a corrupted pointer, which is never dereferenced). @pawosm-arm on what CPU model do you run the tests?

@mstorsjo
Copy link
Member

If I get the log posted by @pawosm-arm right, this test case should crash but it doesn't:

  case 3: {
    // Test that a corrupted signature crashes the program.
    uint64_t signed_ptr = __emupac_pacda(ptr1, ptr2);
    __emupac_autda(signed_ptr + (1ULL << 48), ptr2);
    break;
  }

I guess the reason is that the CPU supports FEAT_PAuth (thus actual autda instruction is executed instead of the software emulation), but not FEAT_FPAC (so that __emupac_autda simply returns a corrupted pointer, which is never dereferenced). @pawosm-arm on what CPU model do you run the tests?

Not sure about that case, but my case on Windows above also seems to error on running case 3. That is running on the free github actions runners, which AFAIK run "Azure Cobalt 100" processors, which are Neoverse N2 based. The feature detection on Windows might not set all the same feature flags as the feature detection on Linux though.

@pawosm-arm
Copy link
Contributor

If I get the log posted by @pawosm-arm right, this test case should crash but it doesn't:

  case 3: {
    // Test that a corrupted signature crashes the program.
    uint64_t signed_ptr = __emupac_pacda(ptr1, ptr2);
    __emupac_autda(signed_ptr + (1ULL << 48), ptr2);
    break;
  }

I guess the reason is that the CPU supports FEAT_PAuth (thus actual autda instruction is executed instead of the software emulation), but not FEAT_FPAC (so that __emupac_autda simply returns a corrupted pointer, which is never dereferenced). @pawosm-arm on what CPU model do you run the tests?

According to our CI maintainers, it's Neoverse V2.

@atrosinenko
Copy link
Contributor

Hmm, my guess seems to be wrong, as both Neoverse N2 and V2 support FEAT_FPAC, if I understand corresponding TRMs correcly.

The feature detection on Windows might not set all the same feature flags as the feature detection on Linux though.

@mstorsjo As far as I see, this patch does not ask the OS about supported CPU features, it just executes a backward-compatible XPACLRI instruction and checks whether it behaves as NOP or not.

@pcc
Copy link
Contributor Author

pcc commented Jul 10, 2025

The test failures could be for two reasons:

  • Machine supports FEAT_PAuth but not FEAT_FPAC (as previously stated). I'm only aware of Apple M1 and Cortex-A78C supporting FEAT_PAuth without FEAT_FPAC.
  • DA key is disabled (i.e. SCTLR_EL1.EnDA clear). I'm not sure how Windows controls whether PAC keys are disabled, but on Linux that could happen if the kernel is old (pre-5.0 which added PAC support) or the kernel was configured to disable PAC (CONFIG_ARM64_PTR_AUTH was unset in the kernel config).

The latter seems the more likely explanation if it's Neoverse V2.

Separately, a build of LLVM on macOS, using the latest Clang+LLVM+compiler-rt, also seems to be failing now (but that's possibly related to another PR than this one):

I think it's this PR again. IIRC Mach-O name mangling prepends a _ which also needs to be added in inline asm.

Let me revert this for now and bring it back with a fix for both of these issues.

@pcc
Copy link
Contributor Author

pcc commented Jul 10, 2025

Revert: 0c0aa56

@mstorsjo
Copy link
Member

  • DA key is disabled (i.e. SCTLR_EL1.EnDA clear). I'm not sure how Windows controls whether PAC keys are disabled, but on Linux that could happen if the kernel is old (pre-5.0 which added PAC support) or the kernel was configured to disable PAC (CONFIG_ARM64_PTR_AUTH was unset in the kernel config).

This is quite likely the reason on Windows as well - it wouldn't surprise me if PAC is universally disabled so far (at least on whatever version of Windows running on the github actions runners) even if the HW supports it.

Separately, a build of LLVM on macOS, using the latest Clang+LLVM+compiler-rt, also seems to be failing now (but that's possibly related to another PR than this one):

I think it's this PR again. IIRC Mach-O name mangling prepends a _ which also needs to be added in inline asm.

Yeah, it looks like a clear case of that.

I was surprised to see this issue crop up here on its own though, because I don't see why anything should be pulling in the new emupac object file from the builtins library; either the builtins library is being linked with -wholearchive or similar (which would be unusual), or something else is actually referencing a symbol from this new object file. (I thought some separate commit was making code generation actually refer to the _emupac symbols and pull it in that way, but that doesn't seem to be the case.)

Let me revert this for now and bring it back with a fix for both of these issues.

Thank you!

@pcc
Copy link
Contributor Author

pcc commented Jul 11, 2025

Proposed reland: #148094

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

9 participants